We address the problem of localisation of objects as bounding boxes in imageswith weak labels. This weakly supervised object localisation problem has beentackled in the past using discriminative models where each object class islocalised independently from other classes. We propose a novel framework basedon Bayesian joint topic modelling. Our framework has three distinctiveadvantages over previous works: (1) All object classes and image backgroundsare modelled jointly together in a single generative model so that "explainingaway" inference can resolve ambiguity and lead to better learning andlocalisation. (2) The Bayesian formulation of the model enables easyintegration of prior knowledge about object appearance to compensate forlimited supervision. (3) Our model can be learned with a mixture of weaklylabelled and unlabelled data, allowing the large volume of unlabelled images onthe Internet to be exploited for learning. Extensive experiments on thechallenging VOC dataset demonstrate that our approach outperforms thestate-of-the-art competitors.
展开▼